How To Add Two 16 Bit RGB565 Pixels Together Nicely

Rawhed / Sensory Overload

Right, I'm assuming you've dealt with 16 bit color before, so I won't explain the basics. A very cool thing you can do with 16 bit color is layering. Yes, just like Photoshop. Layering and transparency are very cool effects, but they are actually very processor intensive, because you have to separate the red, green and blue values out of each pixel color. So for example, here is a standard transparency routine:

        mov ax,[edi]    ;getpixel from buf1
        mov bx,[esi]    ;getpixel from buf2
        mov cx,ax       ;working copies, AX and BX keep the originals
        mov dx,bx       ;(could also use the stack, I know...)

                        ;now separate into RGB
                        ;RED
        shr cx,11
        shr dx,11
        add cx,dx
        shr cx,1        ;/2 for transparency
        shl cx,11
        push cx         ;run out of registers...so stack it

                        ;GREEN
        mov cx,ax       ;reload both pixels, CX and DX got clobbered above
        mov dx,bx
        and cx,0000011111100000b
        and dx,0000011111100000b
        shr cx,5
        shr dx,5
        add cx,dx
        shr cx,1        ;/2 for transparency
        shl cx,5
        pop dx          ;get the red result back
        add cx,dx
        push cx         ;run out of registers...so stack it

                        ;BLUE
        mov cx,ax       ;reload both pixels again
        mov dx,bx
        and cx,0000000000011111b
        and dx,0000000000011111b
        add cx,dx
        shr cx,1        ;/2 for transparency
        pop dx          ;get red+green back
        add cx,dx

        mov [edi],cx    ;write the RGB pixel to the screen

Yes, I know it's not the best implementation, but I think it shows how you have to split the RGB and deal with the channels separately. And you have to either push & pop, or you run out of registers. It's disgusting. You could tweak this to be a bit better, but it's still yucky. I tried a better way.
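For comparison, here is the per-channel arithmetic that routine is going for, written as a small C sketch. The function name blend50_reference is made up for this example, and it assumes the usual RGB565 layout (red in bits 15..11, green in bits 10..5, blue in bits 4..0). It is handy as a reference to check faster versions against.

```c
#include <stdint.h>

/* 50/50 blend of two RGB565 pixels, one channel at a time.
   blend50_reference is a hypothetical name used only in this sketch. */
uint16_t blend50_reference(uint16_t p1, uint16_t p2)
{
    /* Pull each channel out, add, then halve. */
    unsigned r = (((p1 >> 11) & 0x1Fu) + ((p2 >> 11) & 0x1Fu)) >> 1;
    unsigned g = (((p1 >> 5) & 0x3Fu) + ((p2 >> 5) & 0x3Fu)) >> 1;
    unsigned b = ((p1 & 0x1Fu) + (p2 & 0x1Fu)) >> 1;

    /* Put the three channels back into one 16 bit pixel. */
    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

Blending white (0xFFFF) with black (0x0000) this way gives 0x7BEF, a mid grey.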

I'd been thinking about this better way for a while (ever since I started coding with RGB color), but for some reason never thought it would work. I just tried it, and since it works I'm writing this document.

It is cooler because it only uses 2 working registers, EAX and EBX (plus the ESI and EDI pointers). Nothing else, and no stack. It's also cleaner and uses 32 bit operations for some of the code. Freeing up registers is important because then you can optimise the rest of your inner loop A LOT. Basically I thought that when you go:

        mov ax,[edi]
        mov bx,[esi]

Then why couldn't you just go:

        add ax,bx

And that would add the pixel colours together. Well, it doesn't work because overflows occur and the blue might seep into the green, or the green seep into the red. So you do have to work with the R, G and B separately. So here is what I thought of:
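To see the seeping in action, here is a tiny C sketch of the naive add. The name naive_add565 is invented for this example.

```c
#include <stdint.h>

/* Add two RGB565 pixels as plain 16 bit numbers. Carries cross the
   channel borders, which is exactly the problem described above.
   naive_add565 is a hypothetical name used only in this sketch. */
uint16_t naive_add565(uint16_t p1, uint16_t p2)
{
    /* The cast truncates to 16 bits, the same as ADD AX,BX would. */
    return (uint16_t)(p1 + p2);
}
```

Adding two medium blues, 0x0010 + 0x0010, gives 0x0020: the blue field is now 0 and the lowest green bit is set, so the blue has carried into the green. Adding 0xF800 and 0x0800 (bright red plus dim red) carries right out of the top of the word and leaves 0x0000, which is black.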

If you have the pixel1 color in AX and the top half of EAX has been cleared, then EAX contains zeros and then the data. The same goes for the pixel2 color in BX and EBX. So if you could somehow use that zero area in the registers as a buffer zone to spread the RGB values out across, you could add the colors together without the red, green or blue interfering with each other. Understand? Hehe.

OK, here are EAX and EBX right after you read the pixels into them:

fedcba9876543210fedcba9876543210
0000000000000000RRRRRGGGGGGBBBBB       <---source pixel1 EAX
0000000000000000RRRRRGGGGGGBBBBB       <---source pixel2 EBX

Now, if you could get it to look like this:

fedcba9876543210fedcba9876543210
000RRRRR00GGGGGG000BBBBB00000000       <---rearranged source pixel1 EAX
000RRRRR00GGGGGG000BBBBB00000000       <---rearranged source pixel2 EBX

Then you could just go:

        add eax,ebx

And there would be no problems. Cool eh? But how do you get it rearranged? And then surely once you've added the 2 together, you have to rearrange it back to the standard RGB format? Yes, yes, yes. We have to perform a transformation on EAX and EBX. Luckily for you I've already figured out the transformation (and it was fun), and here it is:

fedcba9876543210fedcba9876543210
0000000000000000RRRRRGGGGGGBBBBB       -- original data
00000RRRRRGGGGGGBBBBB00000000000       -- rol eax,11 ;step1
00000RRRRRGGGGGG00000000000BBBBB       -- shr ax,11  ;step2
00000000000BBBBB00000RRRRRGGGGGG       -- ror eax,16 ;step3
00000000000BBBBB000RRRRRGGGGGG00       -- shl ax,2   ;step4
00000000000BBBBB000RRRRR00GGGGGG       -- shr al,2   ;step5
000RRRRR00GGGGGG00000000000BBBBB       -- rol eax,16 ;step6
000RRRRR00GGGGGG000BBBBB00000000       -- shl ax,8   ;step7

and same thing for EBX:

fedcba9876543210fedcba9876543210
0000000000000000RRRRRGGGGGGBBBBB       -- original data
00000RRRRRGGGGGGBBBBB00000000000       -- rol ebx,11 ;step1
00000RRRRRGGGGGG00000000000BBBBB       -- shr bx,11  ;step2
00000000000BBBBB00000RRRRRGGGGGG       -- ror ebx,16 ;step3
00000000000BBBBB000RRRRRGGGGGG00       -- shl bx,2   ;step4
00000000000BBBBB000RRRRR00GGGGGG       -- shr bl,2   ;step5
000RRRRR00GGGGGG00000000000BBBBB       -- rol ebx,16 ;step6
000RRRRR00GGGGGG000BBBBB00000000       -- shl bx,8   ;step7
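If you want to check the end result of those seven steps outside of assembly, the same layout can be built in C with plain masks and shifts. This is only a sketch of the target layout, not a translation of the rotate sequence, and the name spread565 is made up for the example.

```c
#include <stdint.h>

/* Spread an RGB565 pixel out to one channel per byte:
       000RRRRR 00GGGGGG 000BBBBB 00000000
   spread565 is a hypothetical name used only in this sketch. */
uint32_t spread565(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1Fu;   /* 5 bits of red   */
    uint32_t g = (p >> 5) & 0x3Fu;    /* 6 bits of green */
    uint32_t b = p & 0x1Fu;           /* 5 bits of blue  */

    /* Red goes in the top byte, green in the next, blue in the third.
       The low byte stays zero. */
    return (r << 24) | (g << 16) | (b << 8);
}
```

A white pixel (0xFFFF) comes out as 0x1F3F1F00, and adding two of those gives 0x3E7E3E00: every channel has doubled inside its own byte and nothing has crossed into a neighbour.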

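OK, cool, so now we have two dwords, ready to add. And they won't overflow into each other: every channel sits in its own byte with at least two spare zero bits above it, so a carry out of one channel can never reach the next. So you can either add them and then divide by 2 for transparency (shift right by 1, then mask each byte so a bit falling out of one channel doesn't land in the byte below), or you can do a cool thing and add them and clip them to their maximum range. Like: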

        r=r1+r2;
        g=g1+g2;
        b=b1+b2;
        if (r>31) r=31;
        if (g>63) g=63;
        if (b>31) b=31;

                            ;must clip to 31, 63, 31:
           ror eax,8
           ;---> 0000000000RRRRRR0GGGGGGG00BBBBBB
           cmp al,31
            jle @blue_is_cool
            mov al,31
            @blue_is_cool:

           ror eax,8
           ;---> 00BBBBBB0000000000RRRRRR0GGGGGGG
           cmp al,63
            jle @green_is_cool
            mov al,63
            @green_is_cool:

           ror eax,8
           ;---> 0GGGGGGG00BBBBBB0000000000RRRRRR
           cmp al,31
            jle @red_is_cool
            mov al,31
            @red_is_cool:
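The same clipping can be sketched in C. This version assumes the sum is still in the byte-per-channel layout (red in the top byte, then green, then blue) and hands it back in that same layout, so it does not follow the rotates one by one. The name clamp_lanes is invented for this example.

```c
#include <stdint.h>

/* Clip each channel of a summed, spread-out pixel to its maximum.
   Input and output layout: red in bits 31..24, green in bits 23..16,
   blue in bits 15..8, low byte zero.
   clamp_lanes is a hypothetical name used only in this sketch. */
uint32_t clamp_lanes(uint32_t sum)
{
    uint32_t r = (sum >> 24) & 0xFFu;
    uint32_t g = (sum >> 16) & 0xFFu;
    uint32_t b = (sum >> 8) & 0xFFu;

    if (r > 31) r = 31;   /* 5 bit channel */
    if (g > 63) g = 63;   /* 6 bit channel */
    if (b > 31) b = 31;   /* 5 bit channel */

    return (r << 24) | (g << 16) | (b << 8);
}
```

White plus white in spread form is 0x3E7E3E00, and clipping it gives 0x1F3F1F00, which is plain white again.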

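Cool. After those three rotates EAX has turned by 24 bits in total, so it isn't back in the layout we added in. The clipped channels now sit like this:

00GGGGGG000BBBBB00000000000RRRRR         ;in EAX

And we need to convert it back to normal RGB format, which is just as easy as: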

fedcba9876543210fedcba9876543210
000BBBBB00000000000RRRRR00GGGGGG    --- rol eax,8      ;1
000BBBBB00000000000RRRRRGGGGGG00    --- shl al,2       ;2
00000000000RRRRRGGGGGG00000BBBBB    --- rol eax,8      ;3
00000000000RRRRRGGGGGG00BBBBB000    --- shl al,3       ;4
0000000000000RRRRRGGGGGG00BBBBB0    --- ror eax,2      ;5
0000000000000RRRRRGGGGGGBBBBB000    --- shl al,2       ;6
0000000000000000RRRRRGGGGGGBBBBB    --- shr eax,3      ;7

and now we just go:

        mov [edi],ax        ;putpixel
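For checking results, the conversion back can also be sketched in C with masks and shifts. This version takes the channels in the byte-per-channel layout (red in the top byte, then green, then blue) rather than following the assembly's rotates step by step, and the name pack565 is made up for the example.

```c
#include <stdint.h>

/* Pack a byte-per-channel value back into an RGB565 pixel.
   Expected input: red in bits 28..24, green in bits 21..16,
   blue in bits 12..8, already clipped to 31, 63, 31.
   pack565 is a hypothetical name used only in this sketch. */
uint16_t pack565(uint32_t lanes)
{
    uint32_t r = (lanes >> 24) & 0x1Fu;
    uint32_t g = (lanes >> 16) & 0x3Fu;
    uint32_t b = (lanes >> 8) & 0x1Fu;

    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

So 0x1F3F1F00 packs back down to 0xFFFF, and 0x0A141E00 (red 10, green 20, blue 30) becomes 0x529E.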

So here is a complete example:

            xor eax,eax
            xor ebx,ebx
            mov ax,[esi]        ;getpixel spritemap
            mov bx,[edi]        ;getpixel target for transparency layering

            ;--------------------------------------

            rol eax,11 ;1       ;conversion 1
            shr ax,11  ;2
            ror eax,16 ;3
            shl ax,2   ;4
            shr al,2   ;5
            rol eax,16 ;6
            shl ax,8   ;7

            rol ebx,11 ;1       ;conversion 2
            shr bx,11  ;2
            ror ebx,16 ;3
            shl bx,2   ;4
            shr bl,2   ;5
            rol ebx,16 ;6
            shl bx,8   ;7

            ;--------------------------------------

            add eax,ebx         ;add them together!!!

            ;--------------------------------------

           ror eax,8
           cmp al,31
            jle @blue_is_cool           ;check if overflow - blue?
            mov al,31
            @blue_is_cool:

           ror eax,8
           cmp al,63
            jle @green_is_cool          ;check if overflow - green?
            mov al,63
            @green_is_cool:

           ror eax,8
           cmp al,31
            jle @red_is_cool            ;check if overflow - red?
            mov al,31
            @red_is_cool:

            ;--------------------------------------

            rol eax,8      ;1           ;convertback
            shl al,2       ;2
            rol eax,8      ;3
            shl al,3       ;4
            ror eax,2      ;5
            shl al,2       ;6
            shr eax,3      ;7

            ;--------------------------------------

            mov [edi],ax                 ;putpixel
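The complete example above does the add-and-clip variant. The other option mentioned earlier, add and then divide by 2 for transparency, is not spelled out in assembly here, so below is one way it might look, sketched in C. The function names to_lanes, from_lanes and avg565 and the mask constant are assumptions made for this example, not something taken from the assembly.

```c
#include <stdint.h>

/* RGB565 -> 000RRRRR 00GGGGGG 000BBBBB 00000000
   (to_lanes is a hypothetical name used only in this sketch). */
uint32_t to_lanes(uint16_t p)
{
    return ((uint32_t)(p & 0xF800u) << 13)    /* red   -> bits 28..24 */
         | ((uint32_t)(p & 0x07E0u) << 11)    /* green -> bits 21..16 */
         | ((uint32_t)(p & 0x001Fu) << 8);    /* blue  -> bits 12..8  */
}

/* The reverse of to_lanes (also a hypothetical name). */
uint16_t from_lanes(uint32_t v)
{
    return (uint16_t)(((v >> 13) & 0xF800u)
                    | ((v >> 11) & 0x07E0u)
                    | ((v >> 8) & 0x001Fu));
}

/* 50/50 transparency: one add and one shift for all three channels. */
uint16_t avg565(uint16_t p1, uint16_t p2)
{
    uint32_t sum = to_lanes(p1) + to_lanes(p2);

    /* Shifting halves every channel at once, but the lowest bit of
       each channel drops into the byte below it. The mask throws
       those stray bits away. */
    uint32_t half = (sum >> 1) & 0x1F3F1F00u;

    return from_lanes(half);
}
```

Averaging white (0xFFFF) with black (0x0000) gives 0x7BEF, the same mid grey you get from halving each channel separately. No clipping is needed in this variant because an average can never exceed the larger of its two inputs.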

It's very speedy, very slick, and it frees up registers. I found that using this technique I could move everything else out of the inner loop, and I just had like 3 more instructions in the inner loop. Anyways, lemme know what you think. Perhaps this is old hat.

                                                  -Rawhed/Sensory Overload
                                                  -Mailto:andrew@overload.co.za
                                                  -Http://www.overload.co.za
                                                  -Andrew Griffiths
                                                  -South Africa
                                                  -05-07-1999